fix(ci): handle disk mounting and logs reading edge-cases #7690
Temporarily disabled the `set -e` option around the docker logs command to handle the broken pipe error gracefully. Handle more complex scenarios in our `Result of ${{ inputs.test_id }} test` job
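A minimal sketch of the `set -e` toggle described above, assuming a `bash` step with `pipefail` enabled. `stream_logs` is a hypothetical stand-in for the real `docker logs` call, and the search pattern is made up for illustration:

```shell
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical stand-in for `docker logs --follow <container>`.
stream_logs() {
  i=1
  while [ "$i" -le 500 ]; do
    echo "log line $i"
    i=$((i + 1))
  done
}

# grep exits after the first match, so tee (or the log producer) may hit a
# broken pipe. With `set -e` and `pipefail` active, that would abort the step,
# so errexit is switched off only around this one pipeline.
set +e
first_match=$(stream_logs | tee /dev/null | grep --max-count=1 "log line 7")
logs_status=$?
set -e

echo "matched: ${first_match} (pipeline status: ${logs_status})"
```

The status is kept in `logs_status`, so a later check can still decide whether a broken pipe should count as a failure.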
The sync-to-checkpoint test won't run any more because we have a checkpoint disk now. So it won't run on PRs or in the scheduled jobs. I think the […] There might also be arguments we can pass to […]
@teor2345 yeah, at least I was able to capture the errors before the checkpoint disk got fully built and the job started to get skipped.
Let's see if the error happens on other jobs? If it doesn't then maybe we can lower the priority of this PR and ticket.
It's happening here too: https://github.com/ZcashFoundation/zebra/actions/runs/6450509996/job/17510016301#step:4:240
It might be worth trying […]
It looks like […] That means all the other modes won't work either. But […]
If this does not work try `(tee … || true)`
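A small sketch of how the `(tee … || true)` guard changes the pipeline status under `pipefail`. The early-exiting `true` is only a stand-in for a reader such as `grep --max-count=1`, and the `sleep` is there just to make the broken pipe reproducible:

```shell
#!/usr/bin/env bash
set -o pipefail

# The reader (`true`) exits immediately, so tee's write to stdout fails
# with a broken pipe once the delayed log line arrives.
unguarded_status=0
{ sleep 1; echo "log line"; } | tee /dev/null | true || unguarded_status=$?

# Same pipeline, but tee's failure is masked inside a subshell, so
# `pipefail` no longer sees a failing stage.
guarded_status=0
{ sleep 1; echo "log line"; } | (tee /dev/null || true) | true || guarded_status=$?

# The unguarded status is typically 141 (128 + SIGPIPE).
echo "unguarded: ${unguarded_status}, guarded: ${guarded_status}"
```

Note that the guard only masks `tee` itself; a failing producer earlier in the pipeline would still fail the step.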
This should be good to go
Thanks, that's a lot better.
Unfortunately this PR failed in the merge queue due to issue #7659, so it might help to focus on fixing that issue first.
Yeah, I'll now focus on #7659
Looks good to me, sync-to-checkpoint test passed here: https://github.com/ZcashFoundation/zebra/actions/runs/6455786703/job/17524400741
That's the sync-past-checkpoint test; this PR also needs a full sync to verify it works. @gustavovalverde did you want to run a full sync now, or wait until Friday?
@teor2345 last Friday it failed, so I triggered a new one here: https://github.com/ZcashFoundation/zebra/actions/runs/6460865308
Running https://github.com/ZcashFoundation/zebra/actions/runs/6460865308 failed, as that's basically the only test not needing a cached state and thus the only one which was not tested in this PR 😞 It should be fixed now in: […] I'm running a Full sync from that branch to confirm: https://github.com/ZcashFoundation/zebra/actions/runs/6461129131
Motivation
Some tests are still failing with `tee: 'standard output': Broken pipe`. Recently this has happened more frequently with the `sync-to-checkpoint` test, but it could happen on other tests too.
Also, mounting errors are still happening, and we'd like to capture the `dmesg` output when this happens at `launch`, for better debugging. Other options like `set -x` will also help with debugging and with replicating some commands locally.
Edit:
In the end we had these different outputs for disks from the same VM:
[…]
Based on the preceding output, we can see that GCP is mounting the disks in a different order on each VM. This is why the `/dev/sda` and `/dev/sdb` devices are not consistent across the VMs, nor across reboots on the same VM.
Fixes #7564
Fixes #7659
Fixes #7614
Solution
For `tee: 'standard output': Broken pipe`:
- `shell: /usr/bin/bash -exo pipefail {0}` in GitHub jobs, for tests deployed to GCP
- `grep` and `docker wait`
- `trap` quotes correctly
- `--command` quoting for GCP tests
For `failed to mount local volume: mount /dev/sdb device or resource busy`:
To work around this we're extracting the disk to be used, based on the device name we set in previous GitHub Action steps.
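One way to look up the disk by device name, sketched under assumptions: on GCP Linux images, attached disks are normally exposed as `/dev/disk/by-id/google-<device-name>` symlinks, so resolving the symlink gives the current `/dev/sdX` node regardless of attach order. `resolve_disk`, `BY_ID_DIR`, and the device name in the usage note are hypothetical, and this is not necessarily the exact approach used in this PR:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Resolve the block device behind a disk's device name.
# BY_ID_DIR is a hypothetical override, so the sketch can be tried off-VM.
resolve_disk() {
  local device_name="$1"
  local by_id_dir="${BY_ID_DIR:-/dev/disk/by-id}"
  local link="${by_id_dir}/google-${device_name}"

  if [ ! -e "$link" ]; then
    echo "no disk found for device name '${device_name}'" >&2
    return 1
  fi

  # Follow the symlink to the real device node (for example /dev/sdb).
  readlink -f "$link"
}
```

For example, `disk_path=$(resolve_disk "my-cached-state-disk")` could then be handed to the mount step instead of a hard-coded `/dev/sdb`; the device name here is made up.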
Review
`sync-to-checkpoint` test.
Reviewer Checklist